1 Introduction

In this tutorial, we’ll explore how to use R Studio as a mapping application, by drawing on the tmap package (among other packages). In our applied example, we’ll focus on taking an indicator from the World Bank’s Development Indicators (a widely used collection of cross-national time-series data on economic development), and displaying this indicator on a world map. More specifically, we will be making a choropleth map; choropleths are only one of many different kinds of maps out there, but they are widely used in the social sciences, since so much of our data is connected with enumeration units (like census tracts, counties, countries etc.). For a good primer on choropleth maps, please see this guide.

In working through the example below, you will become familiar with the basic mechanics of a simple, yet fundamentally important GIS workflow: the process of joining a conventional tabular dataset (that is, a dataset without explicit spatial coordinates attached to it) to a spatial dataset (that is, an explicitly georeferenced dataset), with a view towards displaying information from the tabular dataset on the geographic boundaries delineated by the spatial dataset.

Why map the information in tabular datasets to begin with? To some extent, for the same reason we attempt to make a visualization of any kind. It can help to communicate information; it can help us identify patterns we wouldn’t have otherwise noticed; and it can suggest novel hypotheses. More specifically, making a map might alert us to the spatial dimensions of the social phenomenon we are investigating, which might affect our methodological or substantive approach to a given research problem. Just as good social scientists try to contextualize their social process or outcome of interest in time, it is also important to contextualize it in space; making a map is often the first step in this process.

It is important to underscore that this tutorial is not an introduction to cartography, which is an art unto itself, and beyond our scope. However, the tutorial will show you how to customize maps in R, and you can use these skills (after reading up on cartography on your own) to make maps that conform to sound cartographic design principles. Our goal here is not to make works of art, but rather to learn how to quickly create functional and informative choropleth maps that are useful to social scientists.

For more on principles of cartographic design, a good place to start is this accessible guide from Axis Maps. As you begin to make different types of maps (or more sophisticated ones, with more moving parts than a standard choropleth map), you may want to begin with these resources from Esri: Design Principles for Cartography and Make Maps People Want to Look At, both by Aileen Buckley.

2 Preliminaries

R is an open-source programming language for statistical computing that allows users to carry out a wide range of data analysis and visualization tasks (among other things). One of the big advantages of using R is that it has a very large user community among social scientists and statisticians, who frequently publish R packages. One might think of packages as workbooks of sorts, which contain a well-integrated set of R functions, scripts, data, and documentation; these “workbooks” are designed to facilitate certain tasks or implement given procedures. These packages are then shared with the broader community, and at this point, anyone who needs to accomplish the tasks to which the package addresses itself can use the package in the context of their own projects. The ability to use published packages considerably simplifies the work of applied social scientists using R; it means that they rarely have to write code entirely from scratch, and can build on the code that others have published in the form of packages. This allows applied researchers to focus on substantive problems, without having to get too bogged down in complicated programming tasks.

In the context of this tutorial, generating choropleth maps based on the World Development Indicators (WDI) data would be relatively complex if we had to write all our code from scratch. However, because we are able to make use of mapping and visualization packages written by other researchers, the task is considerably simpler, and will not require any complicated programming.

In order to process our data and make our maps, we will use a variety of packages. They are:

  • WDI: It is possible to download our WDI variable of choice directly from the World Bank’s WDI website (see the “Access Data” section), but there is an R package, entitled WDI, that allows users to directly query and extract WDI data without leaving R. Since this is more convenient than downloading the dataset externally and loading it into R, we will use the WDI package to extract the WDI series of our choice.
  • sf*: The sf package allows us to work with spatially explicit data in R. In particular, it allows us to work with “sf” objects. “sf” stands for “simple features”, and is a data structure that is able to store and display geospatial vector data. Vector data (along with raster) is one of the two main types of GIS data (along with raster data), and we’ll discuss it in more detail next class.
  • tmap: The tmap package will allow us to create and customize our map. It has become one of the major R mapping packages in use today. There are other package that facilitate mapping in R. One prominent alternative to tmap, which many of you may be familiar with, is ggplot2. However, while tmap uses very similar syntax to ggplot2, the former is a dedicated mapping package, while the latter is a more general-purpose visualization package. In my experience, tmap is slightly more intuitive and user-friendly, so we will use it instead of ggplot2 in this tutorial.
  • rnaturalearth and rnaturalearthdata: In order to visualize our data on a world map, we need a spatial dataset of the world’s country boundaries. One way to get such a dataset is to download it from a public repository, and then load it into R Studio. However, these packages allow us to load a spatial dataset of world boundaries into R Studio as sf objects without having to actually download anything, which effectively saves us a few steps in the workflow.
  • tidyverse: The tidyverse is a suite of data-science packages (ggplot2, mentioned above, is actually a part of the tidyverse) that provide useful functions to implement common data science/data analysis tasks. We’ll use some tidyverse packages to clean up our data, subset data implement table joins etc.
  • grid: The grid package allows us to customize the layout of our maps, and in particular, make inset maps (an inset map is a smaller map embedded within the frame of the main map; it can be used to contextualize a given map, or to provide a “zoomed in” view of a particularly important part of the main map, or a part of the main map that is difficult to see).

To install a package in R, pass the name of the package (within quotation marks) to the install.packages() function. For example, let’s say you don’t have tmap installed. You can install it with the following:

# Installs tmap packages
install.packages("tmap")

A function is essentially a programming construct that takes a specified input, runs this input (called an “argument”) through a set of procedures, and returns an output. In the code block above, the name of the package we wanted to install (here, “tmap”) was enclosed within quotation marks and passed as an argument to install.packages; this effectively downloaded the tmap package to your computer.

Repeat that process for any packages you don’t have installed.

After all the packages are downloaded, we must load them into memory. We can think of the process of loading installed packages into a current R environment as analogous to opening up an application on your phone or computer after it has been installed (even after an application has been installed, you can’t use it until you open it!). To load (i.e. “open”) an R package, we pass the name of the package we want to load as an argument to the library() function. Below, we load all of the required packages into memory:

# Loads required libraries
library(WDI)
library(sf)
library(tmap)
library(rnaturalearth)
library(rnaturalearthdata)
library(tidyverse)
library(grid)

At this point, the packages are loaded and ready to go! One important thing to note regarding the installation and loading of packages is that we only have to install packages once; after a package is installed, there is no need to subsequently reinstall it. However, we must load the packages we need (using the library function) every time we open a new R session. In other words, if we were to close R Studio at this point and open it up later, we would not need to install these packages again, but would need to load the packages again (3.5).

Note that the codeblocks in this tutorial usually have comments, prefaced by a hash (“#”). When writing code in R (or any other command-line interface) it is good practice to preface one’s code with brief comments that describe what a block of code is doing. Writing these comments can allow someone else (or your future self) to read and quickly understand the code more easily than otherwise might be the case. The hash before the comment effectively tells R that the subsequent text is a comment, and should be ignored when running a script If one does not preface the comment with a hash, R wouldn’t know to ignore the comment, and would throw an error message.

3 Load Required Data

Now that we’ve taken care of these preliminary steps, let’s go ahead and load our data into R Studio. Below, we’ll first load in a spatial dataset of world boundaries, and then read in our World Bank dataset using the WDI package.

Before proceeding, however, it is useful to briefly consider the concept of object asssignment, which will make the subsequent sections easier to follow. Consider the following example:

# assign value 5 to new object named x
x<-5

In the code above, we used R’s assignment operator (<-, i.e. a left-facing arrow) to assign value 5 to an object named “x.” Now that an object named “x” has been created and assigned the value 5, printing “x” in our console (or printing “x” in our script and running it) will return the value 5:

# Print value assigned to object "x"
x
## [1] 5

More generally, the process of assignment effectively equates the output created by the code on the right side of the assignment operator (<-) to an object with a name that is specified on the left side of the assignment operator. Whenever we want to look at the value assigned to an object (i.e. the output created by the code to the right side of the assignment operator), we simply print the name of the object in the R console (or print the name and run it within a script).

While the example above was very simple, we can assign virtually any R code, and by extension, the data structure(s) generated by that code (such as datasets, maps, graphs) to an R object. Indeed, we’ll use the basic principle of object assignment introduced above to assign the datasets we’ll import below to new objects. Note that object names are arbitrary and could be virtually anything, but it is good practice for object names to describe their contents. If this is new, it will begin to make more sense as we go.

3.1 Load and process the spatial dataset of country boundaries

When working with spatial data in R, we will sometimes want to import data that is stored on our computer. There are several functions in the sf package that will allow us to easily import saved or downloaded spatial data into R; the most commonly used sf package function to load saved spatial vector data into R is the st_read() function. For more details, please consult the st_read() function’s documentation by typing ?st_read().

In our case, however, we won’t have to download and import the spatial data we need into R Studio from our computer’s local drive. That is because there are R packages that already provide this spatial data, and allow us to directly load it into memory. In particular, we’ll use the ne_countries() function of the rnaturalearth package to bring a spatial dataset of country borders into our R environment, and then assign it to an object that we will name country_boundaries:

# Brings spatial dataset of country boundaries into R environment using the rnaturalearth package, and then assigns this spatial dataset to an object named "country_boundaries"
country_boundaries<-ne_countries(scale="medium", returnclass="sf")

Note the two arguments we pass to the ne_countries function: the “scale” argument specifies that we want to use a medium scale when rendering the map (the other options are ‘small’ and ‘large’), while the “returnclass” argument specifies that we want the spatial dataset as an sf object.

Now that we have our spatial dataset loaded into our R environment and assigned to country_boundaries, let’s open up this dataset and see what it looks like. The best way to view a dataset in R studio is to pass the name of the relevant object to the View() function, which will open up the dataset in R Studio’s built-in data viewer.

# View "country_boundaries" data in R Studio Data Viewer
View(country_boundaries)

By scrolling across the dataset, you’ll note that each row corresponds to a country, and that there are many columns that correspond to various country-level attributes. The crucial column, however, which makes this a spatial dataset (as opposed to merely a tabular one), is the information contained in a column called “geometry”. This column contains geographic coordinate information that essentially defines a polygon for each country in the dataset. Note that the “geometry” column is likely one of the last columns in dataset, so you may have to scroll a bit to find it.

To observe the information in the “geometry” column more clearly, we can extract that specific column. The dollar sign ($) is the R operator that allows us to extract a specified column; below, we are extracting the “geometry” column from the dataset assigned to the country_boundaries object:

# Extracts "geometry" column from country_boundaries
country_boundaries$geometry
## Geometry set for 241 features 
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -180 ymin: -89.99893 xmax: 180 ymax: 83.59961
## CRS:            +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0
## First 5 geometries:
## MULTIPOLYGON (((-69.89912 12.452, -69.8957 12.4...
## MULTIPOLYGON (((74.89131 37.23164, 74.84023 37....
## MULTIPOLYGON (((14.19082 -5.875977, 14.39863 -5...
## MULTIPOLYGON (((-63.00122 18.22178, -63.16001 1...
## MULTIPOLYGON (((20.06396 42.54727, 20.10352 42....

Note that extracting the “geometry” column prints some useful metadata; it tells us that the dataset has 241 features, and that it represents spatial information as polygons (geometry type: MULTIPOLYGON). It also provides information on the datasets bounding box (“bbox”) and coordinate reference system (“CRS”). We’ll unpack this information a bit more next class, but for now, what is important to notice is that the “geometry” column is comprised of multiple geographic coordinates for each row; we can use this information in the “geometry” column to draw georeferenced polygons for each row in the spatial dataset, which will yield a world map!

To translate the information in the “geometry” column of the dataset into a cartographic representation, we’ll use the tmap package. In particular, we’ll use the tm_shape and tm_polygons functions from tmap, which are connected by a plus sign (+). The argument passed to the tm_shape function is the name of the object associated with the spatial dataset (country_boundaries, defined above). In addition, the tm_polygons function indicates that the spatial data is to be represented using polygons (as opposed to alternatives such as lines or points), and does not require any arguments (we’ll add some arguments in just a bit). When we type in and run the following code from our script, the result is a map that is rendered based on the information in the “geometry” column of country_boundaries:

# maps geographic features (i.e. countries) of "country_boundaries" using tmap package functions
tm_shape(country_boundaries)+
  tm_polygons()

If you don’t like the grey polygons, you can specify a desired color within the tm_polygons() function. For guidance on working with colors in R (including information on color and palette codes), see this extremely useful R Color Cheatsheet, by Melanie Frazier.

For example, let’s say we want to draw the polygons in the color associated with “darkorange” on the cheat sheet. We can use the following:

# maps geographic features (i.e. countries) of "country_boundaries" using tmap package functions; polygons rendered in "darkorange"
tm_shape(country_boundaries)+
  tm_polygons("darkorange")

Or, say we prefer the color associated with the label “cadetblue2”:

# Maps country polygons from "country_boundaries" in "cyan1"
tm_shape(country_boundaries)+
  tm_polygons("cadetblue2")

Just as we can assign datasets or numeric values to objects, so too with maps. For example, let’s say we want to assign the orange world map we generated above to an object named world_map_orange:

world_map_orange<-tm_shape(country_boundaries)+
                      tm_polygons("darkorange")

Now, whenever we want to bring up that particular map, we can simply print the name of the object, and the map will render in the “Plots” tab of the R Studio interface (on the bottom-right of the screen):

# prints contents of "world_map_orange"
world_map_orange

remove frame remove antarctica web map

3.2 View and extract WDI codes

WDI_variables<-WDIsearch()
View(WDI_variables)
WDI_variables %>% datatable(extensions=c("Scroller", "FixedColumns"), options = list(
  deferRender = TRUE,
  scrollY = 350,
  scrollX = 350,
  dom = "t",
  scroller = TRUE,
  fixedColumns = list(leftColumns = 3)
))
trade_gdp_string<-"NE.TRD.GNFS.ZS"

3.3 Extract WDI data

trade_gdp_2010_2018<-WDI(country="all",
                                indicator=trade_gdp_string,
                                start=2010,
                                end=2018,
                                extra=T)
trade_gdp_2015<-trade_gdp_2010_2018 %>% 
                  filter(year=="2015")
View(trade_gdp_2015)

3.4 Join WDI data to spatial dataset

trade_2015_spatial<-full_join(country_boundaries, trade_gdp_2015,
                                    by=c("iso_a3"="iso3c"))
View(trade_2015_spatial)
## Warning in instance$preRenderHook(instance): It seems your data is too big
## for client-side DataTables. You may consider server-side processing: https://
## rstudio.github.io/DT/server.html
trade_2015_spatial<-trade_2015_spatial %>% 
                      rename(trade_pct_gdp=NE.TRD.GNFS.ZS)

3.5 Make map of WDI data

tm_shape(trade_2015_spatial)+
  tm_polygons(col="trade_pct_gdp", palette="YlOrRd", style="jenks",
              n=5)
## Warning: The shape trade_2015_spatial contains empty units.

## Make map with inset map

unique(trade_2015_spatial$subregion)
##  [1] "Caribbean"                 "Southern Asia"            
##  [3] "Middle Africa"             "Southern Europe"          
##  [5] "Northern Europe"           "Western Asia"             
##  [7] "South America"             "Polynesia"                
##  [9] "Antarctica"                "Australia and New Zealand"
## [11] "Seven seas (open ocean)"   "Western Europe"           
## [13] "Eastern Africa"            "Western Africa"           
## [15] "Eastern Europe"            "Central America"          
## [17] "Northern America"          "South-Eastern Asia"       
## [19] "Southern Africa"           "Eastern Asia"             
## [21] "Northern Africa"           "Melanesia"                
## [23] "Micronesia"                "Central Asia"             
## [25] NA
trade_2015_spatial_sea<-trade_2015_spatial %>% 
                            filter(subregion=="South-Eastern Asia")

y<-tm_shape(trade_2015_spatial_sea)+
  tm_polygons(col="trade_pct_gdp", palette="YlOrRd", style="jenks",
              n=5)

y

library(grid)
x<-tm_shape(trade_2015_spatial)+
  tm_polygons(col="trade_pct_gdp", palette="YlOrRd", style="jenks",
              n=5)
x
## Warning: The shape trade_2015_spatial contains empty units.
print(y, vp = viewport(0.8, 0.27, width = 0.5, height = 0.5))

library(grid)
x<-tm_shape(trade_2015_spatial)+
  tm_polygons(col="trade_pct_gdp", palette="YlOrRd", style="jenks",
              n=5)
x
## Warning: The shape trade_2015_spatial contains empty units.
print(y, vp = viewport(0.8, 0.18, width = 0.3, height = 0.3))
## Legend labels were too wide. The labels have been resized to 0.45, 0.45, 0.41, 0.38, 0.38. Increase legend.width (argument of tm_layout) to make the legend wider and therefore the labels larger.

## Map change over time

trade_gdp_2010_2015<-trade_gdp_2010_2018 %>% 
                      filter(year=="2010"|year=="2015")
trade_gdp_2010_2015_wide<-trade_gdp_2010_2015 %>% 
                          pivot_wider(names_from=year, values_from=c(NE.TRD.GNFS.ZS))
trade_gdp_2010_2015_wide<-trade_gdp_2010_2015_wide %>% 
                            rename("trade_gdp_2015"="2015") %>% 
                            rename("trade_gdp_2010"="2010")
trade_gdp_2010_2015_wide<-trade_gdp_2010_2015_wide %>% 
                          mutate("2015_2010_pctpt_difference"=(trade_gdp_2015-trade_gdp_2010))
trade_gdp_2010_2015_wide_spatial<-full_join(country_boundaries, trade_gdp_2010_2015_wide,
                                    by=c("iso_a3"="iso3c"))
tm_shape(trade_gdp_2010_2015_wide_spatial)+
  tm_polygons(col="2015_2010_pctpt_difference", palette="Greens", style="jenks",
              n=5, midpoint=T)
## Warning: The shape trade_gdp_2010_2015_wide_spatial contains empty units.

https://ecodiv.earth/post/creating-a-map-with-inset-using-tmap/

4 Country-Level Map

5 USA Local Map